Hierarchical Imitation Learning for Stochastic Environments
Many applications of imitation learning require the agent to generate the
full distribution of behaviour observed in the training data. For example, to
evaluate the safety of autonomous vehicles in simulation, accurate and diverse
behaviour models of other road users are paramount. Existing methods that
improve this distributional realism typically rely on hierarchical policies.
These condition the policy on types such as goals or personas that give rise to
multi-modal behaviour. However, such methods are often inappropriate for
stochastic environments where the agent must also react to external factors:
because agent types are inferred from the observed future trajectory during
training, these environments require that the contributions of internal and
external factors to the agent behaviour are disentangled and only internal
factors, i.e., those under the agent's control, are encoded in the type.
Encoding future information about external factors leads to inappropriate agent
reactions during testing, when the future is unknown and types must be drawn
independently from the actual future. We formalize this challenge as
distribution shift in the conditional distribution of agent types under
environmental stochasticity. We propose Robust Type Conditioning (RTC), which
eliminates this shift with adversarial training under randomly sampled types.
Experiments on two domains, including the large-scale Waymo Open Motion
Dataset, show improved distributional realism while maintaining or improving
task performance compared to state-of-the-art baselines.
Comment: Published at IROS'2
VizWiz
The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative to answering visual questions in nearly real-time: asking multiple people on the web. To support answering questions quickly, we introduce a general approach for intelligently recruiting human workers in advance, called quikTurkit, so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research
Simulation is an essential tool to develop and benchmark autonomous vehicle
planning software in a safe and cost-effective manner. However, realistic
simulation requires accurate modeling of nuanced and complex multi-agent
interactive behaviors. To address these challenges, we introduce Waymax, a new
data-driven simulator for autonomous driving in multi-agent scenes, designed
for large-scale simulation and testing. Waymax uses publicly-released,
real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or
play back a diverse set of multi-agent simulated scenarios. It runs entirely on
hardware accelerators such as TPUs/GPUs and supports in-graph simulation for
training, making it suitable for modern large-scale, distributed machine
learning workflows. To support online training and evaluation, Waymax includes
several learned and hard-coded behavior models that allow for realistic
interaction within simulation. To supplement Waymax, we benchmark a suite of
popular imitation and reinforcement learning algorithms with ablation studies
on different design decisions, where we highlight the effectiveness of routes
as guidance for planning agents and the ability of RL to overfit against
simulated agents.
Using FPGAs to perform embedded image registration
Image registration is the process of relating the intensity values of one image to another image using their pixel content alone. An example use of this technique is to create panoramas from individual images taken from a rotating camera. A class of image registration algorithms, known as direct registration methods, uses intensity derivatives to iteratively estimate the parameters modeling the transformation between the images. Direct methods are known for their sub-pixel accurate results; however, their execution is computationally expensive, often preventing use in an embedded capacity like those encountered in small unmanned aerial vehicle or mobile phone applications. In this work, a high-performance FPGA-based direct affine image registration core is presented. The proposed method combines two features: a fully pipelined architecture to compute the linear system of equations, and a Gaussian elimination module, implemented as a finite state machine, to solve the resulting linear system. The design is implemented on a Xilinx ML506 development board featuring a Virtex-5 SX50 FPGA, zero bus turn-around (ZBT) RAM, and VGA input. Experimentation is performed on both real and synthetic data. The registration core performs in excess of 80 frames per second on 640x480 images using one registration iteration.
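The final step the abstract describes, solving the small linear system produced by the pipelined accumulator, can be sketched in software. This is a minimal, illustrative Gaussian elimination with partial pivoting in Python, not the paper's finite-state-machine implementation, and the 2x2 example system is hypothetical (a direct affine method would solve a 6x6 system for the six affine parameters):

```python
def gaussian_eliminate(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    A is an n x n list of lists, b a length-n list; returns x as a list.
    """
    n = len(A)
    # Build the augmented matrix [A | b] so row operations update b too.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical 2x2 system: 2x + y = 3, x + 3y = 5  ->  x = 0.8, y = 1.4
solution = gaussian_eliminate([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

In the paper's hardware setting the same sequence of pivot, eliminate, and back-substitute steps is driven by a finite state machine rather than loops.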
Automatically tuning background subtraction parameters using particle swarm optimization
A common trait of background subtraction algorithms is that they have learning rates, thresholds, and initial values that are hand-tuned for a scenario in order to produce the desired subtraction result; however, the need to tune these parameters makes it difficult to use state-of-the-art methods, fuse multiple methods, and choose an algorithm based on the current application, as it requires the end-user to become proficient in tuning a new parameter set. The proposed solution is to automate this task by using a Particle Swarm Optimization (PSO) algorithm to maximize a fitness function compared to provided ground-truth images. The fitness function used is the F-measure, which is the harmonic mean of recall and precision. This method reduces the total pixel error of the Mixture of Gaussians background subtraction algorithm by more than 50% on the diverse Wallflower dataset.
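The fitness function and the PSO update are both compact enough to sketch. The following is an illustrative Python version, assuming per-frame true-positive/false-positive/false-negative pixel counts against ground truth; the PSO coefficients and the two-parameter search space (learning rate, threshold) are stand-ins, not the paper's exact configuration:

```python
import random


def f_measure(tp, fp, fn):
    """F-measure: harmonic mean of precision and recall over foreground pixels."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO velocity/position update for a single particle.

    pos/vel are the particle's current parameter vector and velocity;
    pbest/gbest are the particle's and swarm's best-known positions.
    """
    new_vel = [
        w * v
        + c1 * random.random() * (pb - x)   # pull toward the particle's own best
        + c2 * random.random() * (gb - x)   # pull toward the swarm's best
        for x, v, pb, gb in zip(pos, vel, pbest, gbest)
    ]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

A full tuner would run the background subtractor at each particle's (learning rate, threshold) position, score the output mask with `f_measure`, and iterate `pso_step` until the swarm converges.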
Analyzing Team Actions With Cascading HMMs
While team action recognition has a relatively extended literature, less attention has been given to the detailed real-time analysis of the internal structure of the team actions. This includes recognizing the current state of the action, predicting the next state, recognizing deviations from the standard action model, and handling ambiguous cases. The underlying probabilistic reasoning model has a major impact on the type of data it can extract, its accuracy, and the computational cost of the reasoning process. In this paper we use Cascading Hidden Markov Models (CHMM) to analyze Bounding Overwatch, an important team action in military tactics. The team action is represented in the CHMM as a plan tree. Starting from real-world recorded data, we identify the sub-teams through clustering and extract team-oriented discrete features. In an experimental study, we investigate whether the better scalability and the more structured information provided by the CHMM comes with an unacceptable cost in accuracy. We find that a properly parametrized CHMM estimating the current goal chain of the Bounding Overwatch plan tree comes very close to a flat HMM estimating only the overall Bounding Overwatch state (a subset of the goal chain) at a respective overall state accuracy of 95% vs 98%, making the CHMM a good candidate for deployed systems. Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
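The flat-HMM baseline in the comparison estimates the current overall state with the standard forward recursion. A minimal Python sketch of that recursion follows; the two-state transition and emission tables in the example are toy values for illustration, not the paper's Bounding Overwatch model:

```python
def forward(obs, pi, A, B):
    """HMM forward algorithm: posterior over the current hidden state.

    obs: sequence of observation indices
    pi:  initial state distribution, pi[s]
    A:   transition matrix, A[prev][next]
    B:   emission matrix, B[state][observation]
    Returns the normalized distribution over states given all observations so far.
    """
    n = len(pi)
    # Initialize with the first observation.
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    # Propagate: predict through A, then weight by the new observation's likelihood.
    for o in obs[1:]:
        alpha = [
            sum(alpha[prev] * A[prev][s] for prev in range(n)) * B[s][o]
            for s in range(n)
        ]
    z = sum(alpha)
    return [a / z for a in alpha]
```

A cascading HMM stacks such recursions per level of the plan tree, which is what lets it report a whole goal chain rather than a single overall state.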
Person And Vehicle Tracking In Surveillance Video
This evaluation for person and vehicle tracking in surveillance presented some new challenges. The dataset was large and very high-quality, but with difficult scene properties involving illumination changes, unusual lighting conditions, and complicated occlusion of objects. Since this is a well-researched scenario [1], our submission was based primarily on our existing projects for automated object detection and tracking in surveillance. We also added several new features that are practical improvements for handling the difficulties of this dataset. © 2008 Springer-Verlag Berlin Heidelberg
VizWiz::LocateIt - enabling blind people to locate objects in their environment
Blind people face a number of challenges when interacting with their environments because so much information is encoded visually. Text is pervasively used to label objects, colors carry special significance, and items can easily become lost in surroundings that cannot be quickly scanned. Many tools seek to help blind people solve these problems by enabling them to query for additional information, such as color or text shown on the object. In this paper we argue that many useful problems may be better solved by directly modeling them as search problems, and present a solution called VizWiz::LocateIt that directly supports this type of interaction. VizWiz::LocateIt enables blind people to take a picture and ask for assistance in finding a specific object. The request is first forwarded to remote workers who outline the object, enabling efficient and accurate automatic computer vision. A two-stage algorithm is presented that uses this information to guide users to the appropriate object interactively from their existing cellphones. © 2010 IEEE